Learning: Emissions Data - Table, Map, and Chart

Author

Luis Laban

Introduction

In this exercise, you will work with a CO2 emissions dataset downloaded from Gapminder and produce a report with three tabs: a data table, a line chart, and a choropleth map.

The goal is to roughly replicate the Our World in Data visualization page on consumption-based CO2 emissions.

Be sure to view that page to get an idea of the final product.

Setup

  • You should have forked and cloned this repository to your local machine.

  • Now, create and select a virtual environment in VSCode.

  • Install the following packages:

    • pandas
    • plotly
    • itables
    • ipykernel
    • jupyter
    • country_converter
  • Download the data from Gapminder by selecting: Environment > Emissions > CO2 Total emissions, then downloading the CSV file into a data folder in your repository.

Data Import

Run the following code to import the necessary libraries:

import pandas as pd
import numpy as np
import plotly.express as px
from itables import show
import country_converter as coco

Load in your dataset from gapminder below. View it in your data viewer to get an idea of the structure.

# Load the data
emissions = pd.read_csv(r"C:\Users\luisl\Documents\GitHub\co2_emissions_demo\data\co2_cons.csv")
emissions.columns
Index(['country', '1800', '1801', '1802', '1803', '1804', '1805', '1806',
       '1807', '1808',
       ...
       '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
       '2022'],
      dtype='object', length=224)

Initial Cleaning

In this dataset, some values are given in thousands, with a “k” used to represent the thousands. This will cause problems when we try to make these columns numeric. So we need to clean this. We’ll do this for you, but pay close attention as you might need it for your final project.

First, let’s see the issue:

emissions.query("country == 'China'")[["country",  "2020", "2021", "2022"]]
country 2020 2021 2022
32 China 9980.0 10.6k 10.5k

Notice the letter “k” at the end of “10.6k” as an example.

We can remove the “k” and multiply those values by 1000 with the following code:

for col in ["2021", "2022"]:
    has_k = emissions[col].str.contains("k")
    values = emissions[col].str.replace("k", "")
    emissions[col] = np.where(has_k, values.astype(float) * 1000, values.astype(float))

And check that it worked:

emissions.query("country == 'China'")[["country",  "2020", "2021", "2022"]]
country 2020 2021 2022
32 China 9980.0 10600.0 10500.0

Table Section

Our goal is to create a table showing emissions for a few selected years and calculate absolute and relative changes.

  1. Subset the data to include Country, 2000, and 2022 columns only.
  2. Calculate an “Absolute Change” column (emissions_2022 - emissions_2000).
  3. Calculate a “Relative Change” column ((emissions_2022 - emissions_2000) / emissions_2000).
# Subset the data to include `country`, `2000`, and `2022` columns only.
table_df = emissions[["country", "2000", "2022"]].copy()

# Calculate absolute change as the difference between 2022 and 2000
table_df["Absolute Change"] = table_df["2022"] - table_df["2000"]

# Calculate relative change as the absolute change divided by the 2000 emissions, then multiplied by 100
table_df["Relative Change"] = (table_df["Absolute Change"] / table_df["2000"])*100

# Round to 0 decimal places, and add a % sign to the relative change
table_df["Relative Change"] = table_df["Relative Change"].round(0).astype(str) + "%"

Now we can display this as an interactive table with itables:

show(table_df)
country 2000 2022 Absolute Change Relative Change
Loading ITables v2.2.4 from the internet... (need help?)

Chart Section

Our goal is to create a line chart from 1990 to 2022 for a few selected countries.

  1. Melt the data so that years become rows.
  2. Filter from 1990 to 2022 only.
  3. Choose 5 countries of your choice.
  4. Create a line chart with Plotly Express.
# Melt the dataset. Your id_vars should be "country", your var_name should be "year" and your value_name should be "emissions".
emissions_long = emissions.melt(
    id_vars="country", var_name="year", value_name="emissions"
)

# Convert year to numeric using pd.to_numeric
emissions_long["year"] = pd.to_numeric(emissions_long["year"])
emissions_long

# Convert emissions to numeric using pd.to_numeric
pd.to_numeric(emissions_long["emissions"].astype(str).str.replace("−", "-"))



# Query for years between 1990 and 2022 (that is 1990, 1991, ..., 2022)
emissions_long_1990_2022 = emissions_long.query("1990 <= year <= 2022")

# Query for 5 countries (adjust these to any countries you like)
countries_of_interest = emissions_long_1990_2022.query(
    "country in ['Peru', 'Argentina', 'Australia', 'Nigeria', 'España']"
)

# Create line chart. Year should be on the x-axis, emissions on the y-axis, and color should be by country.
fig_chart = px.line(countries_of_interest, x="year", y="emissions", color="country")
fig_chart

Evaluate missing values

emissions_long.isna().sum()
country      0
year         0
emissions    0
dtype: int64

Mapping Section

This part is done for you.

Goal: Create a choropleth map showing global emissions from 1990 to 2022.
This will be animated by year.

  1. Ensure each country has a 3-letter ISO code. We’ll use country_converter for that.
  2. Create a map with px.choropleth and use animation_frame to show changes over time.
# Convert country names to ISO3 codes
emissions_long_1990_2022["country_code"] = coco.convert(
    emissions_long_1990_2022["country"], to="ISO3"
)

fig_map = px.choropleth(
    emissions_long_1990_2022,
    locations="country_code",
    color="emissions",
    hover_name="country",
    animation_frame="year",
    title="Global CO2 Emissions (1990-2022)",
)

fig_map.show()

Dynamic tabset

Below, Results into a tabbed interface.

show(table_df)
country 2000 2022 Absolute Change Relative Change
Loading ITables v2.2.4 from the internet... (need help?)
fig_chart.show()
fig_map.show()
# Query for 5 countries (adjust these to any countries you like)
countries_of_interest = ["Argentina", "Peru", "Bolivia", "Ecuador", "Brazil"]
country_data_dict = {}
for country in countries_of_interest:
    country_data = emissions_long_1990_2022.query("country == @country")
    country_data_dict[country] = country_data

# Print each country's data
for country, data in country_data_dict.items():
    print(f"Data for {country}:")
    print(data)
    print("\n")
Data for Argentina:
         country  year emissions country_code
36865  Argentina  1990     111.0          ARG
37059  Argentina  1991     120.0          ARG
37253  Argentina  1992     127.0          ARG
37447  Argentina  1993     122.0          ARG
37641  Argentina  1994     126.0          ARG
37835  Argentina  1995     129.0          ARG
38029  Argentina  1996     136.0          ARG
38223  Argentina  1997     143.0          ARG
38417  Argentina  1998     146.0          ARG
38611  Argentina  1999     148.0          ARG
38805  Argentina  2000     140.0          ARG
38999  Argentina  2001     129.0          ARG
39193  Argentina  2002      81.7          ARG
39387  Argentina  2003       112          ARG
39581  Argentina  2004       133          ARG
39775  Argentina  2005       140          ARG
39969  Argentina  2006       150          ARG
40163  Argentina  2007     156.0          ARG
40357  Argentina  2008     172.0          ARG
40551  Argentina  2009     157.0          ARG
40745  Argentina  2010     172.0          ARG
40939  Argentina  2011       189          ARG
41133  Argentina  2012       189          ARG
41327  Argentina  2013       198          ARG
41521  Argentina  2014     184.0          ARG
41715  Argentina  2015     199.0          ARG
41909  Argentina  2016     186.0          ARG
42103  Argentina  2017     190.0          ARG
42297  Argentina  2018     175.0          ARG
42491  Argentina  2019     166.0          ARG
42685  Argentina  2020     158.0          ARG
42879  Argentina  2021     171.0          ARG
43073  Argentina  2022     175.0          ARG


Data for Peru:
      country  year emissions country_code
36996    Peru  1990      23.7          PER
37190    Peru  1991      23.5          PER
37384    Peru  1992      23.9          PER
37578    Peru  1993      26.4          PER
37772    Peru  1994      26.7          PER
37966    Peru  1995      28.0          PER
38160    Peru  1996      27.4          PER
38354    Peru  1997      30.5          PER
38548    Peru  1998      31.4          PER
38742    Peru  1999      32.7          PER
38936    Peru  2000      32.2          PER
39130    Peru  2001      29.9          PER
39324    Peru  2002      30.9          PER
39518    Peru  2003      29.3          PER
39712    Peru  2004      33.1          PER
39906    Peru  2005      37.4          PER
40100    Peru  2006      32.3          PER
40294    Peru  2007      39.6          PER
40488    Peru  2008      49.9          PER
40682    Peru  2009      53.0          PER
40876    Peru  2010      61.7          PER
41070    Peru  2011      57.6          PER
41264    Peru  2012      57.8          PER
41458    Peru  2013      57.3          PER
41652    Peru  2014      61.5          PER
41846    Peru  2015      61.6          PER
42040    Peru  2016      62.2          PER
42234    Peru  2017      62.4          PER
42428    Peru  2018      64.4          PER
42622    Peru  2019      66.6          PER
42816    Peru  2020      57.3          PER
43010    Peru  2021      66.3          PER
43204    Peru  2022      68.3          PER


Data for Bolivia:
       country  year emissions country_code
36882  Bolivia  1990      9.07          BOL
37076  Bolivia  1991      9.56          BOL
37270  Bolivia  1992      10.2          BOL
37464  Bolivia  1993      10.7          BOL
37658  Bolivia  1994      11.4          BOL
37852  Bolivia  1995      11.2          BOL
38046  Bolivia  1996      11.5          BOL
38240  Bolivia  1997      10.7          BOL
38434  Bolivia  1998      11.0          BOL
38628  Bolivia  1999      11.0          BOL
38822  Bolivia  2000      8.73          BOL
39016  Bolivia  2001       8.4          BOL
39210  Bolivia  2002      9.02          BOL
39404  Bolivia  2003      9.42          BOL
39598  Bolivia  2004      9.22          BOL
39792  Bolivia  2005      9.94          BOL
39986  Bolivia  2006      10.3          BOL
40180  Bolivia  2007      11.4          BOL
40374  Bolivia  2008      12.5          BOL
40568  Bolivia  2009      13.1          BOL
40762  Bolivia  2010      13.5          BOL
40956  Bolivia  2011      15.2          BOL
41150  Bolivia  2012      16.3          BOL
41344  Bolivia  2013      17.9          BOL
41538  Bolivia  2014      20.1          BOL
41732  Bolivia  2015      21.4          BOL
41926  Bolivia  2016      22.8          BOL
42120  Bolivia  2017      23.5          BOL
42314  Bolivia  2018      23.6          BOL
42508  Bolivia  2019      23.3          BOL
42702  Bolivia  2020      21.7          BOL
42896  Bolivia  2021      23.5          BOL
43090  Bolivia  2022      24.3          BOL


Data for Ecuador:
       country  year emissions country_code
36910  Ecuador  1990      16.9          ECU
37104  Ecuador  1991      16.4          ECU
37298  Ecuador  1992      20.9          ECU
37492  Ecuador  1993      23.5          ECU
37686  Ecuador  1994      15.0          ECU
37880  Ecuador  1995      22.6          ECU
38074  Ecuador  1996      23.2          ECU
38268  Ecuador  1997      19.4          ECU
38462  Ecuador  1998      24.6          ECU
38656  Ecuador  1999      21.4          ECU
38850  Ecuador  2000      19.5          ECU
39044  Ecuador  2001      24.2          ECU
39238  Ecuador  2002      26.8          ECU
39432  Ecuador  2003      28.6          ECU
39626  Ecuador  2004      30.8          ECU
39820  Ecuador  2005      32.6          ECU
40014  Ecuador  2006      33.7          ECU
40208  Ecuador  2007      38.2          ECU
40402  Ecuador  2008      35.4          ECU
40596  Ecuador  2009      36.6          ECU
40790  Ecuador  2010      39.3          ECU
40984  Ecuador  2011      42.8          ECU
41178  Ecuador  2012      42.3          ECU
41372  Ecuador  2013      45.4          ECU
41566  Ecuador  2014      49.2          ECU
41760  Ecuador  2015      47.1          ECU
41954  Ecuador  2016      43.7          ECU
42148  Ecuador  2017      45.5          ECU
42342  Ecuador  2018      45.0          ECU
42536  Ecuador  2019      46.7          ECU
42730  Ecuador  2020      40.6          ECU
42924  Ecuador  2021      47.1          ECU
43118  Ecuador  2022      48.6          ECU


Data for Brazil:
      country  year emissions country_code
36883  Brazil  1990     244.0          BRA
37077  Brazil  1991     251.0          BRA
37271  Brazil  1992     253.0          BRA
37465  Brazil  1993     261.0          BRA
37659  Brazil  1994     274.0          BRA
37853  Brazil  1995     296.0          BRA
38047  Brazil  1996     320.0          BRA
38241  Brazil  1997     340.0          BRA
38435  Brazil  1998     347.0          BRA
38629  Brazil  1999     331.0          BRA
38823  Brazil  2000     347.0          BRA
39017  Brazil  2001     340.0          BRA
39211  Brazil  2002     329.0          BRA
39405  Brazil  2003       322          BRA
39599  Brazil  2004       334          BRA
39793  Brazil  2005       348          BRA
39987  Brazil  2006       357          BRA
40181  Brazil  2007     392.0          BRA
40375  Brazil  2008     442.0          BRA
40569  Brazil  2009     412.0          BRA
40763  Brazil  2010     484.0          BRA
40957  Brazil  2011       518          BRA
41151  Brazil  2012       552          BRA
41345  Brazil  2013       589          BRA
41539  Brazil  2014     610.0          BRA
41733  Brazil  2015     546.0          BRA
41927  Brazil  2016     485.0          BRA
42121  Brazil  2017     496.0          BRA
42315  Brazil  2018     475.0          BRA
42509  Brazil  2019     475.0          BRA
42703  Brazil  2020     423.0          BRA
42897  Brazil  2021     443.0          BRA
43091  Brazil  2022     454.0          BRA

for country in countries_of_interest:
    country_data = emissions_long_1990_2022.query("country == @country")
    fig = px.line(
        country_data, x="year", y="emissions", title=f"Emissions for {country}"
    )
    fig.show()